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ABSTRACT 

Motivation: Systems biology employs mathematical modelling to 
further our understanding of biochemical pathways. Since the 
amount of experimental data on which the models are parameterized 
is often limited, these models exhibit large uncertainty in both 
parameters and predictions. Statistical methods can be used to 
select experiments that will reduce such uncertainty in an optimal 
manner. However, existing methods for optimal experiment design 
(OED) rely on assumptions that are inappropriate when data are 
scarce considering model complexity. 

Results: We have developed a novel method to perform OED for 
models that cope with large parameter uncertainty. We employ a 
Bayesian approach involving importance sampling of the posterior 
predictive distribution to predict the efficacy of a new measurement 
at reducing the uncertainty of a selected prediction. We demonstrate 
the method by applying it to a case where we show that specific 
combinations of experiments result in more precise predictions. 
Availability and implementation: Source code is available at: 
http://bmi.bmt.tue.nl/sysbio/software/pua.html 
Contact: |j.vanlier@tue.nl|IN.A.W.v.Riel@tue.nll 
Supplementary information: Supplementary data are available at 
Bioinformatics online. 
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1 INTRODUCTION 

Computational models can be used to predict (un)measured 
behaviour or system responses and formalize hypotheses in a 
testable manner. To be able to make predictions parameters are 
required. Despite the development of new quantitative experimental 
techniques, data are often relatively scarce. Consequently the 
modeller is faced with a situation where large regions of parameter 
space can describe the measured data to an acceptab le degree 
dBrannmark et a/.ll2010l;ICalderhead and Girolamill201ll: Girolam i 



and Calderhead. l201ll:lHasenauer et a/.Ll2010l:lRaue et a/.Ll2Q09b . 
This is not a problem when the predictions required for testing the 
hypothesis (whic h we shall refer to as predictions of interest) are 
well constrained jCedersund and Rollll2009l:lGomez-Cabrero et al. 



2011 



2011 



iGutenkunst et all 20071 : ItCreutz et a/.Ll201ll : lTiemann et al. 
). When this is not the case more data will be required. Optimal 



experiment design (OED) methods can be used to determine which 
experiments would be most useful in order to perform statistical 
inference. Classical design criteri a are often based on l i neariz ation 
around a best fit parameter set (iKreutz and Timmerl l2009h and 
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pertain to effectively constraining the pa rameters dFaller et all 
120031: iRodriguez-Fernandez et all 120061) or predictions ( Casey 
et al. , 120071) . However, when data are scarce considering the model 
complexity or t he model is strongly non-linear, such methods are 
not appropriate (IKreutz et a/.U2011l) . This makes investigating the 
role of parameter uncertainty in OED a relevant topic to explore. 
We propose a method for experimental design that overcomes these 
issues by adopting a probabilistic approach which incorporates 
prediction uncertainty. Our method enables the modeller to target 
experimental efforts in order to selectively reduce the uncertainty of 
predictions of interest. Using our approach, multiple experiments 
can be designed simultaneously revealing potential benefits that 
might arise from specific combinations of experiments. 

We focus on biochemical networks that can be modelled using 
a system of ordinary differential equations. These models comprise 
of equations f(x(t),u(t),p) which contain parameters p (constant 
in time), inputs u(t) and state variables x(t). Given a set of 
parameters, inputs and initial conditions x(0) these equations can 
subsequently be simulated. Measurements y(t) are performed on a 
subset and/or a combination of the total number of states in the 
model. Measurements are hampered by measurement noise £ while 
many techniques used in biology (e.g. wes tern blotting) neces sitate 
the use of scaling and offset parameters q (IKreutz et al. I l2007h . We 
define 6 as 6 = {p,q,xo}, which lists all the required variables to 
simulate the model. 



x(t)=f(x(t),u(t),p) 

y(t) = g(x(t),q)+h) 
x(0)=x 0 



(1) 

(2) 
(3) 



In order to perform inference and experiment design an error 
model is required. For ease of notation we shall demonstrate our 
method using a Gaussian error model. If we consider M time series 
of length N\, N2, Nm hampered by such noise, we obtain 
Equation (4} for the probability density function of the output 
data. Here y t is the true system with true parameters 0 t , where 
<7ij indicates the SD of a specific data point and K serves as a 
normalization constant. 



M Ni 

P(y\o t ) = YlY\P(yi(tj)A) 

i=ij=i 



= Ke i= v- 



(4) 



(5) 



Using Bayes' theorem, we obtain an expre ssion for th e 
posterior probability distribution over the parameters 
The posterior probability distribution is given by normalizing 
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Equation (4} multiplied with the prior to a unit area. Here P(y\6 t ) 
corresponds to the probability of observing dataset y given the 
true parameters 0 t . Both computational as well as methodological 
advances have made Markov Chain Monte Carlo (MCMC) an 
attractive option for obtaining samples from such a distri bution 
(lGeverlll99ilGirolami and CalderheadLl201ll : lKlinkeil2009h . 

Given a sample of the posterior parameter distribution, predictions 
can be made by simulating the model for each of the parameter 
sets. The distribution of such predictions shall be referred to as the 
posterior predictive distribution (PPD) and reflects their uncertainty. 
Since all of these predictions are linked via the parameter 
distributions, the relations between the different predictions can be 
exploited for experimental design. By considering the effects of a 
new measurement on the PPD, it is possible to predict how useful an 
experiment would be. Our approach consists of a number of steps. 
First, we briefly mention how to compute the PPD. Subsequently 
we detail how to compute the efficacy of the new measurement. In a 
third step this measure is used for experimental design. We conclude 
by demonstrating the method by applying it to a case study. 

2 APPROACH 

In order to overcome the limitations of existing OED methods, a sampling- 
based approach for experimental design is proposed. This approach consists 
of four consecutive steps which we shall outline below. 

2.1 Step 1. Computation of the posterior parameter 
distribution 

The first step in the analysis is computing the posterior parameter distribution 
of the model based on the available data: 



P0\y D )cxP(y D \d)P(d) 



(6) 



Here probability reflects a degree of belief and prior knowledge regarding 
the parameters is included in the form of priors, and P(y D \6) denotes 
the conditional probability of the data given the model parameters. The 
unnormalized form of this function is often referred to as the likelihood 
function. Furthermore, P(6) refers to the prior distribution of the parameters. 

In order to sample from the posterior distribution we employ a MCMC 
method known as the Metropolis-Hastings algorithm. This algorithm 
performs a random walk through parameter space where each subsequent 
step is based on a proposal distribution (centred on the current step) and 
an acceptance criterion based on the proposal and probability densities 
at the sampled points. In brief, after an initial burn-in period (which is 
discarded), MCMC methods generate samples from probability distributions 
whose probability densities are known up to a normalizing factor. The 
Metropolis-Hastings algorithm proceeds as follows: 

(1) Generate a sample 6 n+ i by sampling from a proposal distribution 
based on the current state 6 n . 

(2) Compute the likelihood of the data L(y D \6 n+ \) and calculate 
P(e n+1 \y D ) = L(y D \e n+1 )P0 n+1 ), where P0 n+1 ) refers to the prior 
density function. 

(3) Draw a random number y from a uniform distribution between 0 and 
1 and accept the new step if y < min ( l w^ n± r^r . 0 • 

Here Q{6\ -> 62) refers to the proposal density from current parameter 
set 6\ to 62. The ratio of Q ensures detailed balance, a sufficient con dition 
for th e Markov chain to converge to the equilibrium distribution iNeall 
Il996h . It corrects for sampling biases resulting from non-symmetric proposal 
distributions and is defined as the ratio between the proposal densities 
associated with going from n to n + 1 and n+l to n. We employ an 
adaptive Gaussian proposal distribution whose covariance matrix is based on 



a qua dratic approximation to the posterior probability at the current sample 
point iGutenkunst et <2/.ll2007l) . Further details regarding the implementation 
can be found in the Supplementary Materials. 

2.2 Step 2. Determine PPDs for all candidate 
experiments 

A PPD is a distribution of predictions conditioned on the available data 
as shown in Equation (7J. A PPD is obtained by simulating the model 
(including the addition of measurement noise) for a sample of parameter 
sets from the posterior parameter distribution. We simulate a PPD for each 
candidate experiment. These PPDs link the parameters to the predictions 
and via the parameters also link predictions (across different experiments) 
to each other. The model and data constrain the dynamics of the system 
and hereby implicitly impose non-trivial relations between the different 
predictions. Therefore, the observables of candidate experiments are related 
to our prediction of interest. The next step is to exploit the relations within 
these distributions for experimental design. 



P{y\f 



D ,%)) = f 



P(y\6,u (t) )P(6\y D )d6 



(7) 



2.3 



Step 3. Predict EVR based on PPDs and 
measurement accuracies 



To be able to perform experiment design a measure of expected measurement 
efficacy is required. For this purpose, we introduce the expected variance 
reduction (EVR). Consider an independent new measurement of a specific 
prediction (observable). This new measurement is associated with an error 
model G which reflects a certain degree of uncertainty associated with the 
new experiment. If this new experiment were to be performed and a value is 
obtained, then the subsequent step would be to incorporate the new data point 
(and its associated error model) in the likelihood function and re-perform the 
MCMC This new data would subsequently constrain the posterior parameter 
distribution, hence also affecting the prediction of interest (which cannot be 
measured directly). This process is illustrated in Figure ^ 

The outcome of the experiment is known only after the experiment has 
been performed. Therefore the measured value in the error model is not 
known a priori. However, the PPD provides us with a predicted distribution of 
this value (shown in grey in Fig.Q which reflects the uncertainty associated 
with this value. Samples from the PPD can subsequently be substituted as 
'true' values in the error model. By repeating this process for every R- 
th point of our MCMC chain and averaging the result, we weight by the 
probability distribution of this predicted value. Considering that a single 
MCMC is often already computationally demanding, such a nested MCMC 
is likely not tractable. Here we propose an alternative approach. Consider the 
unnormalized densities P(y\6), P(y n \6) andP((9), respectively, corresponding 
to the density model of the data used to determine the initial posterior 
distribution, the density model for the new data point, and the parameter 
prior. Assuming that the new data point is independent of the existing data 
points we can state that P]sf(6\y,y n )(xP(y\0)P(y n \0)P(6) in order to obtain the 
following equation for the new normalized posterior |8j. In this equation, 
Z\ and Z2 denote the normalization constants of the old and new posterior, 
respectively. 

P(y\6)P(y n \6)P0) 



P N (0\y,y n ) = 



(8) 



fP(y\6)P(y n \6)P(6)dO 

P(y\6)P0) P{y\e)Piy n \e)P0) fP(y\ojP0)dd 
' fP(y\6)P0)d6 P(y\O)P0) fP(y\6)P(y n \6)P0)d6 



(9) 



= P(6\y)P(y n \6) 



fP(y\6)P0)d6 
fP(y\d)P(y n \6)P0)dd 



= P0\y)P(yJ)^- 
^2 



(10) 

This relation between the two posteriors can be exploited in order to 
compute expected values by re-weighting samples from the old posterior 
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Prediction of interest (B) 



Fig. 1. Illustration of the effect of adding a new data point on the PPD. Shown 
on the top right is the PPD at one specific time point for two predictions with 
a subset of the samples of the chain indicated with white points. The square 
denotes the location of the 'new measurement'. Prediction A refers to a 
prediction of which a new measurement can be performed (observable), 
whereas B denotes the prediction of interest. Here the grey distribution 
corresponds to the PPD before the new measurement, whereas the white 
Gaussian corresponds to the error model of the new measurement. Due to 
additional constraints imposed by this new measurement in combination with 
the old data and the model, the distribution on the hypothesis side is also 
updated in light of the new data point and shown in white. 

appropriately. Rather than running a new MCMC for every sample, we can 
use self normalized importance sampling on the predictions of the output in 
order to compute expected values. This is shown in Equation < fTTV where 
samples 0/ and Oj are taken from the old posterior distribution, T indicates 
the number of MCMC samples included in the analysis and z(0) indicates 
our quantity of interest. 



E[z\y 



P N (0\y,y n )z(e)dO-- 



■J 



P(O\y)P(y n \d)^z0)dd 



T 



P(y n \0i) 



(ID 



As mentioned before, the value of y n is not known a priori. Therefore, 
we subsequently compute such an expected value for each parameter set in 
the PPD with y n set to the predicted output value from the original PPD. 
Hence this provides a distribution of expected values considering possible 
outcomes of the experiment. The mean of these expected values then provides 
us with a prediction of the quantity of interest. The entire approach can then 
be succinctly summarized in Equation ( fT2l Here the expected value of z is 
computed, G corresponds to the error model and 0/ refers to the i-th parameter 
vector of the chain. Assuming a Gaussian error model with SD a for a new 
measurement on the output y, probability model G is given by Equation 
fl3}. Note that both input as well as output can be any quantity of interest 
(prediction or parameter) indicating the flexibility of the approach. 



1 T T 



G(t,u(t),6i,6 r ) 



j: k =iG(t,u(t),e k ,e r ) 



4^z(t,u(t)A) 



(yitMt),ei)-y(t,u(f),e r )) 

G(t, u(t), 6iX) = e 2? ~ 



(12) 



(13) 



Since the variance of a variable of interest can be computed by 
Equation < I14K we can use the aforementioned method to estimate 



this quantity. The variance reduction can then be computed as shown in 
Equation HTi where o^ ld corresponds to the posterior variance without the 
new measurement and cr^ew corresponds to the expected posterior variance 
with the new measurements taken into account. In other words, one obtains 
the mean variance reduction considering the prediction uncertainty. The 
variance reduction computed by this sampling method is referred to as the 
sampled variance reduction (SVR). 



Vm[z]=E[z 2 ]-(E[z]) 2 



Var/?=l-£ 



new 



(14) 
(15) 



2.3.1 Linear variance reduction: When the measurement error models 
and PPD can reasonably be assumed Gaussian, one can approximate the 
variance reduction by approximating the PPD between the output and the 
measurements of interest with a multivariate Gaussian distribution. First 
the PD covariance matrix fL6l is computed, where z denotes the output of 
interest and x% the b-th MCMC sample of the <2-th measurable state (without 
measurement noise), with Q and T the number of measured points and 
samples, respectively. 



zi x 

Z2 X 



V L ZT x\ 



(16) 



After performing the new measurements with given SDs the covariance 
matrix is updated according to Equation f[8V The resulting variance of the 
prediction of interest z can then be obtained as E new (l, 1). We shall refer to 
the approximated variance reduction as the linear variance reduction (LVR). 



^noise — 



0 0 

0 of 



0 0 



posterior 



4. 
+ £„, 



'+£-> r 1 

1 noise / 



(17) 



(18) 



2.4 



Step 4. Determine measurement points for optimal 
variance reduction 

The probability density model can be obtained by multiplying the error 
models for each candidate measurement. Subsequently, the space of all 
candidate measurements is sampled using Monte Carlo sampling. The 
efficacy of a specific combination of measurements is evaluated by 
computing the variance reduction, which is defined as Equation {15} . During 
this sampling stage, additional constraints which arise because of practical 
considerations can be imposed on the experimental design (simply by 
rejecting such samples). An example of this could be the inability to measure 
certain states simultaneously. The optimal experiment is then obtained by 
determining the combination of measurements that yields the maximal 
predicted variance reduction. 



3 COMPUTATIONAL METHODS 

All of the discussed algorithms were implemented in Matlab (Natick, MA, 
USA). Numerical integration of the differential equations was performed 
with compiled MEX files using numerical integrators from the SUNDIALS 
CVode package (Lawrence Livermore National Laboratory, Livermore, 
CA, USA). Absolute and relative tolerances were set to 10 -8 and 10 -9 , 
respectively. The Gaussian proposal distribution for the MCMC was based 
on an approximation to the Hessian computed using a Jacobian obtained 
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Fig. 2. Model of the JAK-STAT pathway. In this model u\ serves as driving 
input, while the total concentration of STAT (x\+X2 + 2xy) and the total 
concentration of phosphorylated STAT in the cytoplasm (^2+2x3) were 
measured. Note that the step from X4 back to x\ is associated with a delay. 

using finite differencing (H~J T J). All available priors were subsequently 
included in the Hessian approximation. After convergence, the chain was 
thinned to 10 000 samples. The SVR was computed in parallel using OpenCL 
on the GPU using a compiled MEX file. 



4 RESULTS 

To illustrate our me thod, we apply it to a model of the JAK-STAT 
signalling pathway jRaue et a/IEooilToni et a/.ll2009h . The model 
is based on a number of hypothesized steps (See Figure 2). The 
first reaction describes the activation of the erythropoietin receptor 
which subsequently phosphorylates cytoplasmic STAT (x\). Then 
phosphorylated STAT (^2) dimerises (x?,) and is imported into the 
nucleus (x 4 ). Here dissociation and dephosphorylation occurs which 
are associated with a time delay. Similar to the implementation given 
in the original paper, the driving input function was approximated 
by a spline interpolant, while the delay was approximated using a 
linear chain approximation (^5, ... ,^3) . 

, ^nucleus 



Vc 



cyto 



" (P4*13) -Pix\u\ xg =P4*1 -P4*8 



X2=p\X\U\ 



-2p2x\ 



*3 =P2X 2 -P3X3 



x 4 = 



cyto 



^nucleus 



(P3*3) -P4X4 



x 5 =P4M -P4X5 
x 6 =P4*5 -P4X6 
XI =P4X6 -P4X1 



x 9 =P4*8 -P4X9 
xio=p4X 9 -p 4 xio (19) 

xn=p 4 x m -p 4 x n 

X\2=P4X\1-P4X\2 
x\3=P4X\2-P4X\3 



In order to infer the posterior distribution data from the paper 
by Swameye etdl. [2003; http://webber.physik.uni-freiburg.de/~jeti 
/PNAS_Swameye_Data/ (dataset 1)] was used. Measured quantities 
were the total concentration of STAT (x\+X2 + 2x3) and the total 
concentration of phosphorylated STAT in the cytoplasm (^2 + 2^3), 



both reported in arbitrary units (which necessitates two scaling 
parameters). The initial cytoplasmic concentration of STAT is 
unknown while all other forms of STAT are assumed zero at the start 
of the simula t ion. G iven the data, not all parameters are identifiable 
feaue et all l20Q9h . We used uniform priors in log space for the 
kinetic parameters and a Gaussian (/x = 200 nM, a = 20 nM) for the 
initial condition. Parameter two was bounded bet ween ranges, sinc e 
this parameter was non-identifiable from the data feaue q/.Ll2009l) . 
We simulated two chains starting at different initial values up to 
one million parameter sets and assessed convergence by visually 
inspecting differences between batches of samples. 

The uncertainty in model parameters propagates as an uncertainty 
in the predicted responses of the state variables. PPDs were 
simulated for all states as well as the summations of states already 
measured. To simulate the PPDs the chain of parameter sets was 
thinned to 10000 samples using equidistant thinning. Since the 
error model in this case is additive Gaussian noise, there is no 
need to explicitly simulate measurement noise. This can be taken 
into account by multiplying the SD of the measurement by \fl 
(see Supplementary Materials for more information). An example is 
shown in Figure [3]revealing the relation between two predictions at 
different time points. For a complete overview of the PPDs for all 
states, see the Supplementary Materials. 

The relation between the PPDs of different states was explored. 
This relation between two states at the indicated time points is shown 
in more detail in both scatter plot and 2D histogram form in Figure[3] 
The former shows the actual samples from the PPD for one point in 
time. Here each dot represents a simulated value for one parameter 
set from the MCMC chain. As shown in the figure, these different 
states are often non-linearly related at specific points in time. 
The associated 2D histogram corresponds to the same information 
interpreted as probability density. Considering state 3 as observable 
and state 4 as prediction, while assuming a measurement accuracy 
of 0 = 10/a/2 for x$, it can be observed that a significant decrease in 
variance can be attained during the rise of state 3. Measuring state 
3 at the peak value however, results in a smaller variance reduction. 
A few things can be observed. In order for the measurement to be 
useful, there should be a correlation between the measurement and 
the prediction of interest. Additionally, the uncertainty in both should 
be large enough. Since all predictions of state 3 start with an initial 
condition of zero, this implies that the uncertainty at this point is 
low. Therefore, an additional measurement at t = 0 would not yield 
appreciable variance reduction which is also reflected by the fact 
that the SVR starts at a value of zero. 

In order to demonstrate the flexibility of our method it was decided 
to perform OED for a quantity that depends on the model predictions 
in a highly non-linear fashion, namely the time to peak for the 
concentration of dimerized STAT in the nucleus (state 4). The time to 
peak was computed for the state 4 prediction for each parameter set 
from the posterior parameter distribution. We assumed that all states 
except state 4 are measurable with an accuracy of a= 10/V2. As 
potential measurements we also included the two sums of states as 
measured in earlier experiments. The experiment space was sampled 
using a Monte Carlo approach, uniformly sampling the experiment 
space. 

This sampling is shown in Figure [4] where the SVR is shown for 
several combinations of two measurements. In this figure each axis 
corresponds to a potential measurement. Different model outputs 
(potential measurements) are separated using grid lines, while 
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Fig. 3. Top left: one simulated time course of state 3 superimposed on the 
PPD. Two time points are indicated with circles. Bottom left: correlation 
coefficient between states 3 and 4 and SVR of state 4 based on a measurement 
of state 3 (SVR). The relation between the two states at the indicated time 
points is shown in both scatter plot and 2D histogram form. The former 
shows the actual samples from the PPD for one point in time. Here the dots 
represent simulated values belonging to different parameter sets from the 
MCMC chain. In the histogram the colour indicates the number of samples 
in a particular region which is proportional to the probability density. 




Measurement 1 

Fig. 4. Variance reduction of the peak time of dimerized STAT (X4) with 
respect to two new measurements. (A) Each axis represents an experiment, 
where the different model outputs are numbered. Numbers 1 to 3 correspond 
to the first three states whereas 4 and 5 correspond to the sums of states on 
which the original PPD was parametrized. Note that each block on each axis 
corresponds to an entire time series. The block corresponding to experiments 
involving state 1 is shown enlarged in (B). Variance reduction is computed 
using the importance sampling method. 

the interval between each pair of lines corresponds to an entire 
time series. The colour value indicates the SVR for that specific 
experiment. Recall that the original dataset contained measurements 
of two sums of model states. These two observables correspond 
to outputs 5 and 6 in Figure [4] which indicates that additional 
measurements on these would provide very little additional variance 
reduction. 

Interestingly, performing the experimental design for two 
measurements revealed that the largest reduction in variance could 
be obtained by measuring state one at an early and late time 
point. This result underlines the benefit of being able to combine 
multiple measurements in the OED. Furthermore, the analysis 
clearly revealed that the timing of this first time point is crucial. 
However, if accurate timing is not possible in the experiment one 
could consider measuring state three and one instead. Here smaller 
reductions are attained but the timing accuracy required for a 




Fig. 5. Comparison of two methods for calculating the variance reduction. 
Variance reduction of the peak time of dimerized STAT (X4) with respect 
to two new measurements. (A) LVR. (B) Difference between the variance 
reduction computed by means of LVR and importance sampling (shown 
in Fig.|4j. 

reasonable reduction is less stringent. Additionally, we investigated 
how the bounds of the priors on the non-identifiable kinetic rates 
affected our experimental design by widening them. This revealed 
that the EVRs obtained when measuring state 2 or 3 in combination 
with state 1 were more robust (for more information see the 
Supplementary Materials). 

Since both error models in this case are Gaussian, the same 
analysis can be performed using the LVR (which for 7=1000 
samples is about 100-fold faster). The resulting sampling is shown 
in Figure \5\ Qualitatively, the results agree well with those in 
Figure [4] revealing its applicability as an initial sampling step. 
Information gained from an initial LVR sweep can subsequently be 
used to sample only relevant regions of the experiment design space. 
Another example can be found in the Supplementary Materials. 

5 DISCUSSION AND CONCLUDING REMARKS 

In this article, we have outlined a flexible method to perform 
experimental design. Here a Gaussian probability density function 
was used to model the uncertainties. Note, however, that our 
method is not restricted to such error models. In statistical 
parameter inference it is important to determine which error model 
to use for each experiment as this will define the appropriate 
likelihood function. If the likelihood function cannot be computed 
explicitl y then approxim ate Bayesian methods can provide a 
solution (iToni et fl/.Ll2009h . 

In the OED the timing of the new measurement is assumed 
instantaneous (infinitely accurate). It remains an open but relevant 
challenge to incorporate temporal inaccuracies in the current 
framework. It is expected that when timing is more error prone and 
explicitly accounted for, experiments that are only effective during 
brief time intervals will be marked as less beneficial for variance 
reduction. 

In our method, we base the experimental design on the 
expected value of a distribution of variance reductions. However, 
since the entire distribution of possible variance reductions has 
been computed one could also consider incorporating information 
regarding the accuracy of this estimate into the selection process. 
Finding a sensible trade off between EVR and its inaccuracy 
considering the prediction uncertainty remains an open topic for 
further research. 

In order to obtain the posterior distribution, the parameters 
are required to be either identifiable or restricted by means of 
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a finite prior distribution. Even for a small model , idenlt ifiabirity 
can be problematic but easily tested feaue et all |200 ( 1). Given 
a sufficient amount of data, the posterior distribution should be 
relatively insensitive to the assumed priors. It is important 
this a posteriori. One option to investigate prior depende 
vary the priors or determine the effect of a measurement on the 
assumed prior. Note though that the latter strongly depen Is on the 
initial prior, which should be chosen sufficiently wide to 
potential parameter regimes. The number of samples re 
order to get a reliable estimate is highly problem specific. MCMC 
convergenc e is hard to assess and on l y non-convergenc e can be 
diagn osed (ICalderhead and GirolamiL l201ll : ICowles an I Carlirl 



to verify 
nee is to 



cover all 
quired in 



Earn 

Once convergence has been attained, one should verify that 



s will be 



the model sufficiently describes the acquired data as EVF 
based on model predictions. 

The method is not limited to a specific family of distributions 
for the parameters and model predictions. However, stron 
distributions (such as high variance logarithmic distribut 
be problematic. The reason for this is that in such cases 



estimates from a small sample of the tail are quite unrel able and 



give a poor description of the distribution. Therefore it i 



to a posteriori visually inspect the distribution at the tine point 
determined optimal. In the case of heavy tailed distributions , it can be 
beneficial to perform a transformation of the PPD before pe rforming 
the experimental design. 

Consider performing a new measurement as illustrated earlier 
in Figure [T] The estimation of the measurement efficacy involves 
multiplying samples of the old posterior with new weight: 1 in order 
to estimate quantities for the situation after the experiment has been 
performed. When computing such a weighted average it is important 
to keep track of the quality of the estimation. When the posterior 
before and after a new experiment is very different, man) of these 
sample weights will be very low and a large fraction of the samples 
will contribute only negligibly to the estimation of the new 1 
It follows that such an estimate will be poor. We mo iitor this 
degene racy by estimating the effective sample size (ESS) defined 
below toel Moral Jgflkood) . 



ESS r = 



(2 



G(t,u(t),O k A 



(one 



=l G(t,u(t)A,0 r ) 2 

We compute a distribution of ESS values 
incorporated sample) which we characterize by its medibn 
This ESS gives a measure for the quality of the samplir 
case that the importance sampling distribution agrees 
the new posterior, it should scale linearly with the 
included samples. When the values for the ESS are very 
values obtained for the variance reduction can be inaccurate 
implies that such a measurement would be very informa 
an inferential perspective. This stems from the fact that the 
probability distribution would be much narrower. In 
it would be beneficial to perform the experiment and sub; 
redo the MCMC step in such cases (for more informatioi 1 
Supplementary Materials). 

Obtaining the PPDs as well as performing the experime|nt 
is computationally expensive. For the former, model s 
time is a primary concern which can be signific antly 



such 



by us ing compiled s imulation code [se e COPASI (IHoojs et all 



l2006h : ABC-SysBio JLieoe et a/.Ll201Cb : Potters Wheel dMaiwald 



*ly tailed 
ons) can 
variance 



sensible 



(20) 

for each 
value. 
In the 
well with 
number of 
low then 
. It also 
ive from 
updated 
a case, 
equently 
see the 



design 
mulation 
reduced 



Calderh sad. 



120081) : Sloppy Cell jBrown and SethnaL l2003h l. 
more efficient sampling methods for obtaining such 
lig h dimensional spaces are being developed ( Girolami 
" 1201 ll : iToni et all 120091) . For the experiment 
the computational burden can be divided into two 
. First is the sampling of the experiment space. Since 
experiment constitutes a dimension in experiment space, 
saimling this space for a large number of experiments 
prohibitively time consuming. In this case, it may be 
re sort to more sophisticated sampling techniques such as 
]V CMC or sequential Monte Carlo methods. One option 
to perform a fast initial sweep of the experiment space 
the LVR. Then in a subsequent step the actual SVR 
for those samples that resulted in a large LVR. For a 
of the LVR and SVR for one specific application, see 
ementary Materials. Additionally, profiling the resampling 
that the distance calculations for the error model 
trine consuming. Since this step exhibits a large degree 
the resampling step was also implemented to run 
(using OpenCL), treating the resampling for each 
simultaneously. Even on a modest GPU (NVIDIA Quadro 
resulted in considerable speedup (see Supplementary 



and Timmer 
Additionally 
posteriors in 
and 

design part, 
contributions 
each 
densely 
can become 
required to 
population 
we employ is 
by sampling 
is computed 
comparison 
the SuppL 
step revealed 
were most 
of parallelisih 
on the GPU 
MCMC 
FX 580) this 
Material). 

As a last 
of the 

approaches { 
We propo 
existing des 
parameters 
linearization 
Fernandez el 
error model 
enables the 
require 
in order to 
prediction of 
simulations, 
different 
in order to 



remark we would like to point out that if the goal 
exper ment is to discriminate between models, alternative 
>kanda and be relevant to explore. 



The authors 
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Brannmark,C. 1 

endocytosis 

framework. 
Brown,K.S. and 

many poorly 
Calderhead,B. 

systems 

1, 821-835 
Casey,F. et al. 
receptor si falling 



5 usm 1 



>ed a flexible data-based strategy for OED. Where 
gn criteria pertain to effectively constrain specific 
o r target the variance of predictions u sing model 
dCasev et all 120071: iFaller et all 120031: Rodriguez- 



this method is not limited to any specific 
3r assumption regarding the parameter distribution. It 
modeller to select specific predictions of interest that 
decreased uncertainty thereby focus the experimental efforts 
ave time and resources. Furthermore, it allows the 
interest to be any quantity that can be obtained from 
An additional strength of the method is that multiple 
mec surements can be included in the design simultaneously 
elucidate their combinatorial efficacy. 
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